Topic Extraction from News Archive Using TF*PDF Algorithm

Authors

  • Khoo Khyou Bun
  • Mitsuru Ishizuka
Abstract

Too busy to digest the news archive? Since the Web became widespread, the amount of information available online, especially in news archives, has proliferated and threatens to overwhelm human attention. In response, we propose an information system that extracts the main topics in a news archive on a weekly basis. From the weekly report, a user can see what the main news events of the past week were.

Similar resources

Trees for Topic Detection

Extracting topic keywords from on-line text documents is highly significant in text mining applications. In our work, extracted keywords are represented as a hierarchical topic tree. For this, we use an incremental clustering technique for incoming online documents. Moreover, we define a cluster-based measure similar to the tfidf measure and a probabilistic inequality to determine subsum...

News Topic Tracking and Re-ranking with Query Expansion Based on Near-Duplicate Detection

Increases in digital storage capacity have enabled the creation of large-scale news video archives. To make full use of such an archive, it is necessary to grasp the development and dependencies of news stories. To address this problem, we investigate methodologies for tracking and re-ranking news stories. The archive used as a test-bed consists of more than 30,000 news stories. This paper proposes a nov...

Unsupervised language model adaptation for automatic speech recognition of broadcast news using web 2.0

We improve the automatic speech recognition of broadcast news using paradigms from Web 2.0 to obtain time- and topic-relevant text data for language modeling. We elaborate an unsupervised text collection and decoding strategy that includes crawling appropriate texts from RSS feeds, complementing them with texts from Twitter, language model and vocabulary adaptation, as well as a 2-pass decoding. The...

Discriminative Features Selection in Text Mining Using TF - IDF Scheme

This paper describes a technique for discriminative feature selection in text mining. 'Text mining' is the discovery of new, previously unknown information by computer. Discriminative features are the most important keywords or terms in a document collection, describing the informative news it contains. The generated keyword set is used to discover Association Rules am...

Extracting Named Entities Using Named Entity Recognizer and Generating Topics Using Latent Dirichlet Allocation Algorithm for Arabic News Articles

This paper explains how to extract named entities and topics from Arabic news articles. Due to the lack of high-quality tools for Named Entity Recognition (NER) and topic identification for Arabic, we have built an Arabic NER (RenA) and an Arabic topic extraction tool using the popular LDA algorithm (ALDA). NER involves extracting information and identifying types, such as nam...

Publication date: 2002